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Abstract 

ManjQ applications in machine learning and data mining require computing 
pairwise l p distances in a data matrix A 6 R 71 * 23 . For massive high-dimensional 
data, computing all pairwise distances of A can be infeasible. In fact, even storing 
A or all pairwise distances of A in the memory may be also infeasible. 

For < p < 2, efficient small space algorithms exist, for example, based 
on the method of stable random projections, which unfortunately is not directly 
applicable to p — 3, 4, 5, 6, ... This paper proposes a simple method for p = 2, 4, 
6, ... We first decompose the l p (where p is even) distances into a sum of 2 marginal 
norms and p — 1 "inner products" at different orders. Then we apply normal or 
sub-Gaussian random projections to approximate the resultant "inner products," 
assuming that the marginal norms can be computed exactly by a linear scan. 

We propose two strategies for applying random projections. The basic projec- 
tion strategy requires only one projection matrix but it is more difficult to analyze, 
while the alternative projection strategy requires p — 1 projection matrices but its 
theoretical analysis is much easier. In terms of the accuracy, at least for p = 4, the 
basic strategy is always more accurate than the alternative strategy if the data are 
non-negative, which is common in reality. 

1 Introduction 

This study proposes a simple method for efficiently computing the l p distances in a 
massive data matrix A € M. nxD foip > 2 (where v is even), using random proiections fS2\ 

While many previous work on random projections focused on approximating the '2 
distances (and inner products), the method of symmetric stable random projections\% , 
|T3]|T8l[T5l is applicable to approximating the l p distances for all < p < 2. This work 
proposes using random projections for p > 2, a least for some special cases. 

'First draft Dec. 2007. Slightly revised June 2008. 
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Machine learning algorithms often operate on the l p distances of A instead of the 
original data. A straightforward application would be searching for the nearest neigh- 
bors using l p distance. The l p distance is also a basic loss functions for quality mea- 
sure. The widely used "kernel trick," (e.g., for support vector machines (SVM)), is 
often constructed on top of the l p distances[21 10 

Here we can treat pas a tuning parameter. It is common to take p = 2 (Euclidian 
distance), or p = oo (infinity distance), p — 1 (Manhattan distance), or p = (Ham- 
ming distance); but in principle any p values are possible. In fact, if there is an efficient 
mechanism to compute the l p distances, then it becomes affordable to tune learning 
algorithms for many values of p for the best performance. 

In modern data mining and learning applications, the ubiquitous phenomenon of 
"massive data" imposes challenges. For example, pre-computing and storing all pair- 
wise l p distances in memory at the cost 0(n 2 ) can be infeasible when n > 10 6 (or 
even just 10 5 )j5|. For ultra high-dimensional data, even just storing the whole data 
matrix can be infeasible. In the meanwhile, modern applications can routinely involve 
millions of observations; and developing scalable learning and data mining algorithms 
has been an active research direction. One commonly used strategy in current practice 
is to compute the distances on the fly[5 |, in stead of storing all pairwise l p distances. 

Data reduction algorithms such as sampling or sketching methods are also popular. 
While there have been extensive studies on approximating the l p distances for < p < 
2, p > 2 can be useful too. For example, because the normal distribution is completely 
determined by its first two moments (mean and variance), we can identify the non- 
normal components of the data by analyzing higher moments, in particular, the fourth 
moments (i.e., kurtosis). Thus, the fourth moments are critical, for example, in the 
field of Independent Component Analysis (ICA iQjJ. Therefore, it is viable to use the 
l p distance for p > 2 when lower order distances can not efficiently differentiate data. 

It is unfortunate that the family of stable distributions\2A\ is limited to < p < 2 
and hence we can not directly using stable distributions for approximating the l p dis- 
tances. In the theoretical CS community, there have been many studies on approximat- 
ing the l p norms and distancesll2lTOI^[nil^l7lfl^l2^ffll23l[T4l. some of which also 
applicable to the l p distances (e.g., comparing two long vectors). Those papers proved 
that small space (0(1)) algorithms exist only for < p < 2. 

1.1 The Methodology 

Given a giant data matrix A € R" x£) , we assume that a linear scan of the data is 
feasible, but computing all pairwise interactions is not, either due to computational 
budget constraints or memory limits. Also, we only consider evenp = 4, 6, among 
which p = 4 is probably the most important. 

Interestingly, our method is based only on normal (or normal-like) projections. The 
observation is that, when p is even, the l p distance can be decomposed into marginal 

2 It is well-known that the radial basis kernel using the l v distance with < p < 2 satisfies the Mercer's 
condition. However, we can still use the l v distance with p > 2 as kernels, although in this case it is not 
guaranteed to find the "most optimal" solution. For very large-scale learning, we usually will not find the 
"most optimal" solution any way. 



2 



Ip norms and "inner products" of various orders. For example, for two £>-dimensional 
vectors x and y, when p = 4, then 

d {P) = & - v* \ p = E < + E y< + 6 E ^ - 4 E ^ - 4 E 

i— 1 i— 1 i— 1 i— 1 i— 1 i— 1 

Since we assume that a linear scan of the data is feasible, we can compute ^,f =1 xj and 
12iLi Vi exactly. We can approximate the interaction terms Yli=i x \v1^ Si=i x \vu 
and J^iLi x iVi usm g normal (or normal-like) random projections. Therefore, for p 
being even, we are able to efficiently approximate the l p distances. 

1.2 Paper Organization 

Section |2 concerns using normal random projections for approximating I4 distances. 
We introduce two projection strategies and the concept of utilizing the marginal norms 
to improve the estimates. Section [3] extends this approach to approximating Iq dis- 
tances. Section [4] analyzes the effect of replacing normal projections by sub-Gaussian 
projections. 

2 Normal Random Projections for p = 4 

The goal is to efficiently compute all pairwise l p (p = 4) distances in A £ K"*- . It 
suffices to consider any two rows of A, say x and y, where x, y £ M. D . We need to 
estimate the l p distance between x and y 

D 

d (p) = E \ x * ~ y ^ p - 

i=l 

which, when p = 4, becomes 

D D D D D D 

d w = E \ x * - v* i 4 = E x i + E + 6 E x2 *£ - 4 E x *y* - 4 E x ^ ■ 

i—1 i—1 i—1 i—1 i—1 i—1 

In one pass, we can compute YliLi x t anc ^ SiLi Vi easily, but computing the inter- 
actions is more difficult. We resort to random projections for approximating X}£=i x iUi> 
Y^fLi x iUi' an d Si=i x iVi- Since there are three "inner products" of different orders, 
we can choose either only one projection matrix for all three terms (the basic projection 
strategy), or three independent projection matrices (the alternative projection strategy). 
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2.1 The Basic Projection Strategy 

First, generate a random matrix R e M. Dxk (k <C D), with i.i.d. entrie^l from a 
standard normal, i.e., 

rij ~N(0,l), E( rij )=0, E(r?.)=l, E(r|)=3. 
E (rf^r*^-/) =0, if £ or s is odd, and i ^ i' or j ^ f 

Using random projections, we generate six vectors in k dimensions, u\, u-x, U3, V\, 
v 2 , v 3 G R fc : 

D D D 

i—1 i—1 i—1 

D D D 

i—1 i—1 i—1 

We have a simple unbiased estimator of du\ 

d d ^ 

d(4) = E ^ + E y ^ + ^ ( Qu l v 2 - 4u3«i - 4u[v 3 ) . 

i=l i=l 

Lemma 1 

36 f D ° ( D ^ 

Var (d W ) =- [J2 < E Vi + E ^ 

y j=i j=i \j=i y 

\i=l i=l \i=l / I 

V i=l i=l \i=l / / 

/ D D D D \ 

A4 = -t E^E^+E^E^ 2 
\t=i (=1 1=1 1=1 j 

/ D D D D \ 

-~k E E ^ 5 + E ^f? E ^ »i 

\i=i i=i i=i t=i / 

32 / D D D D \ 

\i=l i=l i=l i=l ) 



3 It is possible to relax the requirement of i.i.d samples. In fact, to prove unbiasedness of the estimates 
only needs pairwise independence, and to derive the variance formula requires four-wise independence. 
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Proof 1 See Appendix^ □ 



The basic projection strategy is simple but its analysis is quite involved, especially 
when p > 4. Also, if we are interested in higher order moments (other than variance) 
of the estimator, the analysis becomes very tedious. 

2.2 The Alternative Projection Strategy 

Instead of one projection matrix R, we generate three, R( a \ R^ b \ R( c \ independently. 
By random projections, we generate six vectors in k dimensions, ui, U2, v.3, v\, i>2, 
v 3 e K fc , such that 

D D D 

E(c) 2 (a) 3 (b) 

i—1 i—1 i — 1 

D D D 

E(b) ST^ 2 (a) 3 (c) 

i—1 i—1 i—1 

Here we abuse the notation slightly by using the same u and v for both projection 
strategies. 

Again, we have an unbiased estimator, denoted by d^ a 

D D - 

d (4),a = E X i + E y * + I ( 6u 2«2 - 4 U Jvi - &u[v 3 ) 



i=l i=l 



Lemma 2 



36 ( ° D ( ° 

Var (d (4) , a J =— ( Y A J2vt + ( E x *y* 

i__ 1 i—i \ i—i 

E^E^+IE 



16 

T 



16 

T 



i—1 i—1 \z=l 



Proof 2 The proof basically follows from that of Lemma\l\ 
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Compared with Var (d(4)^ in LemmaQ] the difference would be A4 

Var (d^J - Var (d (4 ),a) = A 4 



4 

D 

■", .7; 



i=l 



' D D D D \ 

E x ? E f «■ + E E ^ I 

i—1 i—1 i—1 / 



•f E^'E^-E^E^; 1 • a) 

\i=i ^=1 i=\ i=i j 

which can be either negative or positive. For example, when all Xi's are negative and 
all yi's are positive, then A4 > 0, i.e., the alternative projections strategy results in 
smaller variance and hence it should be adopted. 

We can show in Lemma [3] that when the data are non-negative (which is more 
likely the reality), the difference in ([]]) will never exceed zero, suggesting that the 
basic strategy would be preferable, which is also operationally simpler (although more 
sophisticated in the analysis). 

Lemma 3 If all entries of x and y are non-negative, then 

Var (d (4) ) - Var (d (4 ),a) = A 4 < 0. (2) 
Proof 3 See Appendix\E[ □. 

Thus, the main advantage of the alternative projection strategy is that it simplifies 
the analysis, especially true when p > 4. Also, analyzing the alternative projection 
strategy may provide an estimate for the basic projection strategy. For example, the 
variance of d^ a is an upper bound of the variance of d( 4 ) in non-negative data. 

In the next subsection, we show that the alternative strategy make the analysis fea- 
sible when we take advantage of the marginal information. 



2.3 Improving the Estimates Using Margins 

Since we assume that a linear scan of the data is feasible and in fact the estimators in 
both strategies already take advantage of the marginal I4 norms, 5^2=1 x t an d S2=i Vi> 
we might as well compute other marginal norms and try to take advantage of them in a 
systematic manner. 

Lemma|4]demonstrates such a method for improving estimates using margins. For 
simplicity, we assume in Lemma|4]that we adopt the alternative projection strategy, in 
order to carry out the (asymptotic) analysis of the variance. 
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Lemma 4 Suppose we use the alternative projection strategy described in Section \Z2\ 
to generate samples Uij, U2,j, u^j, Vij, V2j, andv^j. We estimate du) by 



D D 



d(4),a,mie = ^2 x i + ^2vt + 602,2 - 4a 3 ,i - 4ai, 3 , 

i=l i=l 

where 02,2, 0-3.1, ^1,3, ore respectively, the solutions to the following three cubic equa- 
tions: 

a 2 1 D D 

3 "2,2 T 1 4 4 T 

«2,2 ~k~ U2V2 ~ k 2-^ X ' 2-1 Vi U2V2 

i=l i=l 

+ a 2 ,2 I "E^E 2 ^ I + \ ^xt\\V2\\ 2 + ^yi\\u 2 \\ 2 J = 0. 

\ i = l i=l / \i=l 1 = 1 / 

2 D D 

3 a 3,i r 1 6 V"^ 2 r 

«3,i — ^~ UzVl ~ ~j: X^ Xi X^y* U3Vl 

i=l j=l 

+ 03,1 f-Ex!^^ + fE^Kf+^^NH 8 ) =0. 

V i = l i=l / \i=l 1 = 1 / 



,2 1 £> £> 



3 a l,3 r 1 2 6 T 

a i,3 — — u i«3 t x i z^yi u i v 'i 



i = l i=l 



+ ai,a f - E E 2/?) + (E * 2 imi 2 + E »* n 2 ) = o- 

V i=l i=l / \i=l 1=1 / 

Asymptotically (as k — > 00), the variance would be 

Var (d( 4 ),a,m; e ) 
=36Var (0,2,2) + lQVar (a 2 ^) + IQVar (02,2) 

gg ( Ej=l x i Ej=l Vi ~ (Ei=l X i Vi 



1G 



Ei=l x i Ei=l 2/i + (^Ei=l »•/ 
Ei=l X i Ei=l Vi ~ (Ei=l X iV'' 



2 

V 1 



, 2 

= 1 2/i ! 



1G 



E£=i x l E£=i 2A 2 + (E*=i 



Ei=l Ei=l 2/f + (Ei=l 



=1 Xiyf 



2 .o lfca 
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Proof 4 M6\ 1771/ proposed taking advantage of the marginal I2 norms to improve the 
estimates 0/Z2 distances and inner products. Because we assume the alternative pro- 
jection strategy, we can analyze 0,2.2, ^3,1, and &x,3> independently and then combine 
the results; and hence we skip the detailed proof. 

Of course, in practice, we probably still prefer the basic projection strategy, i.e., 
only one projection matrix instead of three. In this case, we still solve three cubic 
equations, but the precise analysis of the variance becomes much more difficult. When 
the data are non-negative, we believe that Var (d^.a.miej w iU also be the upper bound 
of the estimation variance using the basic projection strategy, which can be easily ver- 
ified by empirical results (not included in the current report). 

Solving cubic equations is easy, as there are closed-form solutions. We can also 
solve the equations by iterative methods. In fact, it is common practice to do only a 
one-step iteration (starting with the solution without using margins), called "one-step 
Newton-Rhapson" in statistics. 



3 Normal Random Projections for P=6 

For higher p (where p is even), we can follow basically the same procedure as for 
p = 4. To illustrate this, we work out an example forp = 6. We only demonstrate the 
basic projection strategy. 

The Zg distance can be decomposed into 2 marginal norms and 5 inner products at 
various orders: 

D D D 

= E^ + E^- 20 E x3 ^ 

i— 1 i— 1 i—1 

D ODD 

+ 15 *ht + 15 E x ^ - 6 E x iV* - 6 E x ^ 

i—1 i—1 i—1 i—1 

Generate one random projection matrix R G M. Dxk , and 





D 
i=l 


D 
i=l 


D 

t=i 


U4,,j 


D 
i=l 


D 

i=l 






D 

E 

. 1 


D 
»=1 


D 

= E^ 

i=i 


Vi,3 


D 

i=l 


D 

= E^- 

i=l 
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Lemma [5]pro vide the variance of the following unbiased estimator of d(R\ : 

D D 

rf(6) = ^2 Xi + ^ Vi + T (~ 2OM3U3 + 15u\v2 + 15^2^4 — 61I5U3 — 6ltlusj 
i=l i=l 
D D k 

= 2J a:? + 2J yf + — — 20u3,ji>3,j + 15«2,j«4,j + 15«4,jW2,j — (iui.i'os.j — 6u5^wi,j. 

i=l i=l J— 1 



Lemma 5 



Var ( d( 6 ) 



y i=i i=i \i=i / / yi=i i=i \i=i 

yi=i 1=1 \i=i / / yi=i 4=1 \»=i y 

\i=l i=l \i=l / / 



where 



600 / D D D D . \ goo / c ° r D D 

= — r- I E x » E ^ + E E — jr ( E ^ E y*' + E E ^ 3 

\j=l i=i i=i i=i / \i=l i=i i=i i=i 

240 / D D D D \ 240 / D D D D 

+ ~ ( E x » E ^ 8 + E ^ f < E Xi ^ 3 ) + ~ ( E x * E + E x3 ^ E ^j^ 3 

\j=l i=l i=l i=l / \i=l i=l i=l i=l 

/ 13 13 13 D \ , Qn / D D 13 13 

i 40 " I \ " 6 \ ~* 6 . \ "> 2 2 \ ~> 4 4 I / 3 \ ~> 9 . \ ~~ 2 5 \ "> 4 

+ — \Z^ X * X^ yi + l^ Xiyi l^ Xiyi ) — jr \ 2-*> Xi 2-~> Vi t-~> Xi Vi XiVi 

\i=l i=l i=l i=l / \i = l i=l 1=1 i=l 

13 D D D \ / D D D D 

E x * E ^ 5 + E ^ f < E ) ~ ~ ( E ^ E yi + E E Xiy * 

i= 1 i—l j — 1 i— 1 / \ i— 1 i — 1 i—l i — 1 

D D D D \ 72 { ° ° ° ° \ 

E x * E y i + E ^ f < E x ^ + y ( E x%i E ^ 6 + E XiVi E a; ^ 15 

i=l i=l i = l i=l / \i=l i=l i = l i=l / 



180 

k 



Proof 5 See Appendix\C\ □. 

When all entries of x and y are non-negative, we believe it is true that Aq < 0, but 
we did not proceed with the proof. 

Of course, it is again a good idea to take advantage of the marginal norms, but we 
skip the analysis. 



4 Sub-Gaussian Random Projections 

It is well-known that it is not necessary to sample ry ~ iV(0, 1). In fact, to have an 
unbiased estimator, it suffices to sample r%j from any distribution with zero mean (and 
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unit variance). For good higher-order behaviors, it is often a good idea to sample from 
a sub-Gaussian distribution, of which a zero-mean normal distribution is a special case. 

The theory of sub-Gaussian distributions was developed in the 1950's. See (6) 
and references therein. A random variable x is sub-Gaussian if there exists a constant 
g > such that for all t e K: 



E (exp(xt)) < exp 



9 2 t 2 



In this section, we sample from a sub-Gaussian distribution with the following 
restrictions: 

E(r tf ) = 0, E(r ii ) = l, E(r*-)=*, 

and we denote j-y ~ SubG(s). It can be shown that we must restrict s > 1. 

One example would be the ry ~ Uniform(—V3 : VS), for which s = |. Al- 
though the uniform distribution is simpler than normal, it is now well-known that we 
should sample from the following three-point sub-Gaussian distributions!!). 

!1 with prob. 
with prob. 1 — i , s > 1 
— 1 with prob. — 

In our analysis, we do not have to specify the exact distribution of r.y and we can 
simply express the estimation variance as a function of s. 

Here, we consider the basic projections strategy, by generating one random projec- 
tion matrix R G W nxD with i.i.d. entries nj ~ SubG(s), and 

D D D 

1=1 2=1 2=1 

_D _D D 



i=l i=l i—1 

We again have a simple unbiased estimator of dm 

d d ^ 

d "(4),s = X ^ + X + ( 6u 2«2 - 4u^wi - 4^3) 

i=l i=l 
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Lemma 6 



Var 



36 / ° ° ( ° \ 2 D ^ 

(d(4), s ) =-r \ ^2 x t^2yt + I xfyf + (s - 3) ^2 xjyt 

\i=l i=l \i=l / s=l , 



48 



i — l i — l \i — 1 / i — l 



6 



\i=l i=l \i=l / i=l 

D D D D D 

Y. x >Y.y> + ^ x ^ + ( s - 3 ) I] 

i=l i=l i=l i=l i=l 

D D D D D 

Y x ^Y y ^ + Y Xiy * + ( s ~ 3 ) X] x * y * 



48 

\i— 1 i — l i—l i — l i—l 

32 / ° D D D D 

+ ~jr ( x * y< + X] X] + ( s - 3) x 

\i=l i = l i=l i=l 



4 4 



Proof 6 See Appendix^ □. 

5 Conclusions 

It has been an active research topic on approximating l p distances in massive high- 
dimensional data, for example, a giant "data matrix" A 6 K nxI? . While a linear scan 
on A may be feasible, it can be prohibitive (or even infeasible) to compute and store all 
pairwise l p distances. Using random projections can reduce the cost of computing all 
pairwise distances from 0(n 2 D) to (n 2 k) where k -C D. The data size is reduced from 
0(nD) to 0(nk) and hence it may be possible to store the reduced data in memory. 

While the well-known method of stable random projections is applicable to < 
p < 2, not directly to p > 2, we propose a practical approach for approximating the 
l p distances in massive data for p = 2, 4, 6, based on the simple fact that, when p 
is even, the l p distances can be decomposed into 2 marginal norms and p — 1 "inner 
products" of various orders. Two projection strategies are proposed to approximate 
these "inner products" as well as the l p distances; and we show the basic projection 
strategy (which is simpler) is always preferable over the alternative strategy in terms 
of the accuracy, at least for p = 4 in non-negative data. We also propose utilizing the 
marginal norms (which can be easily computed exactly) to further improve the esti- 
mates. Finally, we analyze the performance using sub-Gaussian random projections. 
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A Proof of Lemma 1 



D D 

d(A) = X * + ^ + ( 6?i 2«2 - 4U3«l - 4«i« 3 ) 

i— 1 i— 1 

D D / fe 



/ D \ / D \ D 

I 2 1 / 2 1 2 2 2, 2 2 

\i=l / \i=l / i=l 



Thus 



2 



3 



Similarly, we can show 

D 

E(it3,jWi,j) = y~^a;fyi, E («i,j«3,j) = 

i=l i=l 

Therefore, 

D D ^ / k \ 

E (rf(4)) =2Z :E i + Xj^i + ( Xj E ( 6tI2 'j" U2 .3 ~ 4u 3,J«l,J - 4liijD 3j -) I 

i=i t=i K \i=i V t=i «=i »=i / / 

To derive the variance, we need to analyze the expectation 

(6w 2 ,j«2,j - 4u 3tj v ltj - 4wi ii « 3iJ ) 2 

2 2 2 2 2 2 

— 48U2,jUijV2,jV3j + 32U3,jUl t jVl,jV3 t j. 

To simplify the expression, we will skip the terms that will be zeros when taking expectations. 

E (m2,3«L) = e ^ ^2 x iyi r ij + x ^ r ijyi ,r i'j^ j 

( D \ 

_P / 4 4 4 . 9 2 2 2 2 2 2. V~~^ 4 2 4 2 ) 

\i=i i^i' / 



1 4 4 . , , 2 2 2 2. 4 4 

D D / D \ 2 

= Yl x i 2/4 + 2 ( ■ 

i=l i=l \i=l / 
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Similarly 



E ( u h v L 



E ( u h v h) = 



D D / D \ 2 

i=l i=l \i=l / 

D _D / D \ 2 

x < ^ 6 + 2 ( Yl Xi Vi ■ 

i=l i=l \i=l / 



E(«2,jU3,jV2,jWl,j) 



/ D D D D \ 

\i=l i=l i=l i=l / 



D 



\ \ »=1 i^i' / I 1=1 



/ D 

_ p I 5 3 4. 5 2 3 2 

\i=i i&< , 



in I 2 2 2 3 2 i \ ^ 2 2 3 2 2 1 

D 

o V* 5 3 . 5 3 . \ "> 2 2 3 . \ " 2 3 2 

=3 2_^x i y i +2_,x i y l , + ^, x iVi x ^V^ + 2-*,XiV%x%'Vv 

i=l i^i' i^i' i^i' 

D D D D D D 

= ^ ^ + x ^ ^ + ^y* x3 ^ 2 - 

i=l i = l i = l i = l i = l i=l 

Similarly 

B(U2,jUl,jV 2: jV3,j) 

D D D D D D 

i— 1 i — 1 i— 1 i — 1 i— 1 i — 1 

E(U3,jUl,j«ljV3,j) 
D D D D D D 

= x * ^ 4 + Xi ^ 3 x3 ^ + Xiyi a; ^ 3 - 

i=l 1=1 i=l i=l i=l i=l 
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Therefore, 

Var (6u2,j«2,j — 4:U3jVij — 4ui,jVs,j) 



=36 x t + 72 ( Yl ) 

i=l i=l \i=l / 

D D / D \ 2 

+i6 x i + 32 ( X ^ ) 

i=l i=l \i=l / 

D D / D \ 2 

+ 16 + 32 PT^y? 

i=l i=l \i=l / 

/ D D D D D D 

-48 (J2^J2 y> + 1] x <^ 2 1] x i w* + 1] x » 2 w* I] x * 

\i— 1 i—1 i — 1 i—1 i—1 i — 1 

/ D D D D D D 

-48 I ^ x • i/f + x ^ x2j/ ' 2 + Xiy2 

\i=l i=l i=l i=l i=l i=l 

/ D D D D D D 

+32 1 ^2 x\ ^2 yt + ^2 Xiyf ^2 x \y* + XiVi x ^y* 

\i=l i=l i=l i=l i=l i=l 

/ D D D \ 2 

- 6 ^2 x iyf - 4 x i y* - 4 






from which it follows that 



Var 





2:2 + Xiy i 



4 4 . / 2 2 

2_^ x i z^y* + z^ x iVi 



D D / D 



D D / D 



D D / D 



) 



) 



14 



B Proof of Lemma 3 



It suffices to show that 

D D D D \ 

X x * X + X x ^ yi X x *y* I 

i — l i— 1 i— 1 i— 1 / 

D D D D 

X ^ X ^ + X X 

i— 1 i— 1 i — 1 i — 1 

D D D D 

X ^ X ^ + X X iVi X x i y i ) - 

\i— 1 i— 1 i — 1 i — 1 

We need to use the arithmetic-geometric mean inequality: 

1 / n 



n / n \ V» 

y^uij > 7i , provided u>j > 0. 

i = l \i=l / 



Because 



CD D 15 D D 

X ^ X ^ 3 + X) x * X) ^ 5 ~ X^ x * X^ ^ 4 - °- 

i = l i=l i=l i = l i=l i=l 

Thus it only remains to show that 

Z> D D D D D 

X X x *v* + X * X x ^ y i ~ X xiyi X ^ f < - °> 

i—l i — l i—l i — l i=l i — l 

for which it suffices to show that 

D D D D 

nS~^ 3/2 3/2 5/2 5/2 3 3^ n 

2 Z^ x i y i Z^ x i y i -z^ Xiyi Z^ x i y i - ' 

i — l i—l i — l i — l 

or equivalently, to show that, if Zi > Vi £ [1, -D], then 

D D D D 

f( Zi ,i = 1, 2, £>) = 2 J)*? 4 ~ X ^ X ^ °- ^ 

i — 1 i — 1 i—1 i—1 

Obviously, (HJl holds for D = 1 and D = 2. To see that it is true for D > 2, we notice that 
only at («i = 0, 22 = 0, zd = 0), the first derivative of f(zi) is zero. We can also check 
that f(zi = 1, i = 1, 2, I?) > 0. Since /(z;) is a continuous function, we know f(zi) > 
must hold if > for all i. There is no need to worry about the boundary case that Zj = and 
Zi > because it is reduced to a small problem with D' — D — 1 and we have already shown 
the base case when D = 1 and D — 2. Thus, we complete the proof. 
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C Proof of Lemma 5 



D 

3 



i—l i — l i—l 

D D D D 

i=l i=l i=l i=l 



£> 13 

A ^ — G ^ — 6 I T X X X X \ 

d(6) = 7 y Xj + 2^3/i + ^ (^-20m 3 u 3 + 15u 4 w 2 + 15m 2 «4 - 6u 5 « 3 — 6uiU 5 J 

D D fc 

= ^2 X< i + ^ + r _ 2 ° M 3,3«3,3 + 15u 2 ,jV4,j + 15«4,jV2,j - 6ltl,j«5,j — 6U5,j«l, ; 

i—l i—l j — 1 

To derive the variance, we need to analyze the expectation of 

(— 2Cht3ji>3j + 15u2 t jV4 t j + 15u 4j j«2,j — 6uijv 5 j — 6u 5t jVij) 2 
= AOOugjVgj + 225«2,j^4,j + 225uijV2j + 36uijv$j + 36uljVij 

— 600U3,j«3,j«2,j«4,i — 600U3jV3jU4,jV2,j + 240usjVsjUijV 5 j 
+ 240U3jV3jU5,jVlJ + 4:50ll2,jV4,jU4,jV2,j — 18QU2,jV4,jUljV5,j 

— 180U2,jV4,jUsjVij — 18QU4,jV2 } jUl } jVsj — l80U4 t jV2,jUs } jVlJ 
+ 72U1JV5JU5JV1J 

Skipping the detail, we can show that 

D D / D \ 

E i u h v h) = Yl x&i Z y * + 2 ( Z x i y i ) 

i = l i=l \i=l / 

D D / D \ 2 

E (uljvjj ) = Y x i Y f' ? + 2 ( Z ^ f < ) ' 

i=l i=l \t=l / 

D D / D \ 2 

E (uhvlj ) = ^ xf ^ + 2 I ^ a* ?/j J , 

i=l i=l \i=l / 

D D / D \ 2 

E i u h v h) = ^ Z + 2 ( Z ^ ) 

i=l i=l \i=l / 

D D / D \ 2 

E («5,j«i,i) = ^ f» ? + 2 ( Z ^ ) 
i=l i=l \i=l / 
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E{u 3 ,jU2,jV 3 ,jV4,j) 

D D D D D D 

\ "> 5 7 . 3 3 2 4 . 3 4 2 3 

i=l i=l i=l i=l i=l i=l 
E(u 3 ,jU4,jV 3 ,jV2,j) 

D D D D D D 

7 \ ^ 5 . 3 3 V 1 4 2 . 3 2V 4 3 
= 2^ +Z^ XiVi l^XiVi + Z^ XiVi l^ X iVi 

i=l i — 1 i — 1 i — 1 i=l i—1 

E(U3,jWl,jW3,jW5,j) 

D D D D D D 

- Y ^ ^ + Y x < f< + x3 f * Y Xi ^ 3 

1=1 1=1 i=l i=l 1=1 i=l 

E(U3,jU5,j«3jfl,j) 

D D D D D D 

-Y x ^Y y ^ + Y x *yi Y + Y 

i=i »=i »=i »=i i=i »=i 

E (ll2,jU4,jV4,jV2,j) 

D D D D D D 

i=l »=1 i=l i=l i=l i=l 

E(u2,jUl,j«4jVB,j) 

D D D D D D 

: Y x *Y y '? + Y x ^ + ^ Xiy i 

i—1 i — 1 i — 1 i — 1 i—1 i — 1 
E(u 2 ,jUB,jV4,jVl,j) 

D D D D D D 

- Y X * Y y i + Y X i y i Y X i yi + Y X i yi Y X i y i 
i=l »=1 i=l i=l »=1 i=l 

E (U4,jUl,jV2jf5,j) 

D D D D D D 

: Y x *Y y i + Y X i y i Y Xiy i + Y X i y i Y Xiy * 

i—1 i—1 i—1 i—1 i=l i—1 

E (U4,jU5jV2,jVlj) 

D D D D D D 

\ "> 9 V 1 3 . 4 2 V 1 5 . \ " 4 \ ~> 5 2 
: Z^ X * Z^ Vi + / . X i y i / , X i yi + / , x iVi / , X iVi 

i—1 i — 1 i — 1 i — 1 i — 1 i—1 

E(ui,jUB,jV 5 ,jVl,j) 

D D D D D D 

Y x * Y y i + Y Xiy i Y x i yi + Y Xiyi Y a; ' 5 ^ 5 ' 

i—1 i — 1 i — 1 i — 1 i — 1 i — 1 
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Combining the results, we obtain 



Var 




where 



kA 6 /6 = - 100 E x\ E vl + E x ^ E x 'v' 

\j — 1 i— 1 i— 1 i— 1 

/ D D D D 

- ioo I e a 'i E y * + E x * ^ E ^ ^ 

\j — 1 i— 1 i— 1 i— 1 

/ D D D D \ 

+ 40 I E ^ E ^ 8 + E ^ ^ E ) 

\i=l i=l i=l i=l / 

+ 4 o (e ^ E + E E ^0 

\i=i i=i i=i i=i / 

+ 75 (f>?f>? + f>h/?f>?!/A 

\i=l i=l i=l i=l / 

/ D D D D \ 

- 30 ( E x * E + E x » 2 f * E Xi ^ 4 ) 

\i=l i=l i=l i=l / 

/ D D D D \ 

- 30 ( E ^ E y* + E ^ f * E a; ' 5 ^ 4 

\i=l i=l i=l i=l / 

/ D D D D \ 

- 30 ( E x * E y l + E ^ f < E Xi2/ » 2 ) 

\i=l i=l i=l i=l / 

-30 (X>?X>? + X>^X>?2/) 

\i=l i=l i=l i=l / 

/ D D D D \ 

+ 12 E^E^ + E^E 3 ^ 5 
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D Proof of Lemma 6 



d(4),s = 


D D 

= x\ + yf + -■ (6u\v2 
i=i i=i 


- 4U3«i - 


D 


D x / fc 




Y^ 

i=l 




- 4u 3 ,jUi j 



E («2 j«2,j) = E I I X iyi r ij + 5Z X i r ijyi' r i'. 



_ p I 4 4 4 , 9 2 2 2 2 2 2. 424 

— ^ I / f x i Vi r ij T ^ / j ^iVi fij^i'Vi'Tit j T" / f %i Tjj Vi 

\i=l i&> i^i' 

D 

\ ^ 4 4 . , , 2 2 2 2. 4 4 

»=1 i^i' i^i' 

D D / D \ 2 D 

= X < ^ + 2 ( + ( S " 3 ) X iVi- 

i=l i=l \i=l / i=l 



Similarly, 



£> D { D \ D 

e (ui^t;?,,-) = X* Y + 2 [Y x * + - 3) 35 

i=l i=l \i=l / i=l 

D D / D \ 2 _D 

E ( u s = $3 ^ + 2 5Z Xiy » 3 + ( s _ 3 ) Y x ' 2j/ 



E (U2,jM3,j«2jW1,j) = 5Z 5Z ^ + Y 5Z ^ 

i=l i = l i=l i = l 

D D D 

+ Y Y x ^ + ( s ~ 3 ) ^ a;fi/, 3 , 
i=i i=i i=i 

D D D D 

E (u 2 ,jUijV2,jV 3 j) = x 3 i Y Vi + Y ^ 5Z 

i=l i=l i=l i=l 

D D D 

+ Y Xi ^ 2 Y x ' 2 ^ + ( s _ 3) 5Z x ^> 

i = l i = l i=l 

D D D D 

e (u 3 ,jui,jwi jf3,j) = Y x * Y + Y x ^ Y 

i=l i = l i=l i = l 

£> D D 

+ Y Xi y t Y x ^ + ( s - 3) Y x *y*- 
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Therefore, 



Var (6«2,j«2,3 —4u3jVij — 4uijv 3 j) 
=36 XX f>J + 72 f E ?) + 36 ( s " 3) E 



+ 16 E K i E + 32 E ^ + 360 - 3) 52 xlvl 

1=1 t=l \i=l / i=l 

+ !6 E x i E »? + 32 (E + 360 -3)J2 x iVi 

i=i «=i Vi=i / i=i 

/ D D D D D D D 

~ 48 ( E x « E y i + E x i f ? E + E x i Vi E x « f i + ( s - 3)52 x * y i 

\i = l 1=1 i=l i=l i = l i=l i=l 

- 48 fE **< E + E »i»f E . 2 + E **« 2 E * 2 ^ + 0-3) E *?»? 



/ D D D D D D D 

+32 ( X X i X + ^ X X i y * + X X y i + ( s - 3 ) X ^ 

\i=l i=l i=l i=l i = l i=l i=l 

/ D D D \ 2 

-(e£*?w?-4£*h« ) 

V 1=1 i=i »=i / 
from which it follows that 

36 ( ° D I ' ° \ 2 ° \ 

Var [d (4):S J =— ^ a; 4 ^ t/f + x\yl + (s - 3) ^ xf j/ 4 

\i=l i=l \i=l / i=l / 



+t(E><E> ?+ (E><H +(--3)E^ 



„6. 2 



k 



+t(E^E^+(E^) +(»-8)E 

\ i=l i=l \i=l / i=l / 

/ D D D D D 

E x i E ^ 3 + E x * yi E ^ y i + ( s ~ 3 ) E x <^ 3 

i—l i—l i — l i—l i — l 

D D D D D \ 

E x * E y * + E x<2/ « 2 E x2y3 + ( s ~ 3 ) E J 

i=l i=l i = l i—l i=l / 

D D D D D \ 

E x i E y * + E x<2/i E x * ^ + ( s - 3 ) E x <^ 4 

i=l i=l i=l i=l i=l / 



1=1 i = l \i=l / 1=1 

D D / D \ 2 £> 



2 6 



48 
" k 

48 
" k 
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Abstract 

ManjQ applications in machine learning and data mining require computing 
pairwise l p distances in a data matrix A 6 R 71 * 23 . For massive high-dimensional 
data, computing all pairwise distances of A can be infeasible. In fact, even storing 
A or all pairwise distances of A in the memory may be also infeasible. 

For < p < 2, efficient small space algorithms exist, for example, based 
on the method of stable random projections, which unfortunately is not directly 
applicable to p — 3, 4, 5, 6, ... This paper proposes a simple method for p = 2, 4, 
6, ... We first decompose the l p (where p is even) distances into a sum of 2 marginal 
norms and p — 1 "inner products" at different orders. Then we apply normal or 
sub-Gaussian random projections to approximate the resultant "inner products," 
assuming that the marginal norms can be computed exactly by a linear scan. 

We propose two strategies for applying random projections. The basic projec- 
tion strategy requires only one projection matrix but it is more difficult to analyze, 
while the alternative projection strategy requires p — 1 projection matrices but its 
theoretical analysis is much easier. In terms of the accuracy, at least for p = 4, the 
basic strategy is always more accurate than the alternative strategy if the data are 
non-negative, which is common in reality. 

1 Introduction 

This study proposes a simple method for efficiently computing the l p distances in a 
massive data matrix A £ M. nxD for p > 2 (where p is even), using random projections^ 
While many previous work on random projections focused on approximating the 1% 
distances (and inner products), the method of symmetric stable random projections^!, 
?, ?, ?] is applicable to approximating the l p distances for all < p < 2. This work 
proposes using random projections for p > 2, a least for some special cases. 

'First draft Dec. 2007. Slightly revised June 2008. 
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Machine learning algorithms often operate on the l p distances of A instead of the 
original data. A straightforward application would be searching for the nearest neigh- 
bors using l p distance. The l p distance is also a basic loss functions for quality mea- 
sure. The widely used "kernel trick," (e.g., for support vector machines (SVM)), is 
often constructed on top of the l p distances [?]0 

Here we can treat pas a tuning parameter. It is common to take p = 2 (Euclidian 
distance), or p = oo (infinity distance), p ~ 1 (Manhattan distance), or p = (Ham- 
ming distance); but in principle any p values are possible. In fact, if there is an efficient 
mechanism to compute the l p distances, then it becomes affordable to tune learning 
algorithms for many values of p for the best performance. 

In modern data mining and learning applications, the ubiquitous phenomenon of 
"massive data" imposes challenges. For example, pre-computing and storing all pair- 
wise l p distances in memory at the cost 0(n 2 ) can be infeasible when n > 10 6 (or 
even just 10 )[?]. For ultra high-dimensional data, even just storing the whole data 
matrix can be infeasible. In the meanwhile, modern applications can routinely involve 
millions of observations; and developing scalable learning and data mining algorithms 
has been an active research direction. One commonly used strategy in current practice 
is to compute the distances on the fly[?], in stead of storing all pairwise l p distances. 

Data reduction algorithms such as sampling or sketching methods are also popular. 
While there have been extensive studies on approximating the l p distances for < p < 
2, p > 2 can be useful too. For example, because the normal distribution is completely 
determined by its first two moments (mean and variance), we can identify the non- 
normal components of the data by analyzing higher moments, in particular, the fourth 
moments (i.e., kurtosis). Thus, the fourth moments are critical, for example, in the 
field of Independent Component Analysis (ICA)[t]. Therefore, it is viable to use the l p 
distance for p > 2 when lower order distances can not efficiently differentiate data. 

It is unfortunate that the family of stable distributions^!] is limited to < p < 2 and 
hence we can not directly using stable distributions for approximating the l p distances. 
In the theoretical CS community, there have been many studies on approximating the 
l p norms and distances[?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?], some of which also applicable 
to the l p distances (e.g., comparing two long vectors). Those papers proved that small 
space (O(l)) algorithms exist only for < p < 2. 

1.1 The Methodology 

Given a giant data matrix A 6 R nxD , we assume that a linear scan of the data is 
feasible, but computing all pairwise interactions is not, either due to computational 
budget constraints or memory limits. Also, we only consider even p = 4, 6, among 
which p = 4 is probably the most important. 

Interestingly, our method is based only on normal (or normal-like) projections. The 
observation is that, when p is even, the l p distance can be decomposed into marginal 

2 It is well-known that the radial basis kernel using the l v distance with < p < 2 satisfies the Mercer's 
condition. However, we can still use the l v distance with p > 2 as kernels, although in this case it is not 
guaranteed to find the "most optimal" solution. For very large-scale learning, we usually will not find the 
"most optimal" solution any way. 
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Ip norms and "inner products" of various orders. For example, for two £>-dimensional 
vectors x and y, when p = 4, then 

d {P) = & - v* \ p = E < + E y< + 6 E ^ - 4 E ^ - 4 E 

i— 1 i— 1 i— 1 i— 1 i— 1 i— 1 

Since we assume that a linear scan of the data is feasible, we can compute ^,f =1 xj and 
12iLi Vi exactly. We can approximate the interaction terms Yli=i x \v1^ Si=i x \vu 
and J^iLi x iVi usm g normal (or normal-like) random projections. Therefore, for p 
being even, we are able to efficiently approximate the l p distances. 

1.2 Paper Organization 

Section |2 concerns using normal random projections for approximating I4 distances. 
We introduce two projection strategies and the concept of utilizing the marginal norms 
to improve the estimates. Section [3] extends this approach to approximating Iq dis- 
tances. Section [4] analyzes the effect of replacing normal projections by sub-Gaussian 
projections. 

2 Normal Random Projections for p = 4 

The goal is to efficiently compute all pairwise l p (p = 4) distances in A £ K"*- . It 
suffices to consider any two rows of A, say x and y, where x, y £ M. D . We need to 
estimate the l p distance between x and y 

D 

d (p) = E \ x * ~ y ^ p - 

i=l 

which, when p = 4, becomes 

D D D D D D 

d w = E \ x * - v* i 4 = E x i + E + 6 E x2 *£ - 4 E x *y* - 4 E x ^ ■ 

i—1 i—1 i—1 i—1 i—1 i—1 

In one pass, we can compute YliLi x t anc ^ SiLi Vi easily, but computing the inter- 
actions is more difficult. We resort to random projections for approximating X}£=i x iUi> 
Y^fLi x iUi' an d Si=i x iVi- Since there are three "inner products" of different orders, 
we can choose either only one projection matrix for all three terms (the basic projection 
strategy), or three independent projection matrices (the alternative projection strategy). 
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2.1 The Basic Projection Strategy 

First, generate a random matrix R e M. Dxk (k <C D), with i.i.d. entrie^l from a 
standard normal, i.e., 

rij ~N(0,l), E( rij )=0, E(r?.)=l, E(r|)=3. 
E (rf^r*^-/) =0, if £ or s is odd, and i ^ i' or j ^ f 

Using random projections, we generate six vectors in k dimensions, u\, u-x, U3, V\, 
v 2 , v 3 G R fc : 

D D D 

i—1 i—1 i—1 

D D D 

i—1 i—1 i—1 

We have a simple unbiased estimator of du\ 

d d ^ 

d(4) = E ^ + E y ^ + ^ ( Qu l v 2 - 4u3«i - 4u[v 3 ) . 

i=l i=l 

Lemma 1 

36 f D ° ( D ^ 

Var (d W ) =- [J2 < E Vi + E ^ 

y j=i j=i \j=i y 

\i=l i=l \i=l / I 

V i=l i=l \i=l / / 

/ D D D D \ 

A4 = -t E^E^+E^E^ 2 
\t=i (=1 1=1 1=1 j 

/ D D D D \ 

-~k E E ^ 5 + E ^f? E ^ »i 

\i=i i=i i=i t=i / 

32 / D D D D \ 

\i=l i=l i=l i=l ) 



3 It is possible to relax the requirement of i.i.d samples. In fact, to prove unbiasedness of the estimates 
only needs pairwise independence, and to derive the variance formula requires four-wise independence. 
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Proof 1 See Appendix^ □ 



The basic projection strategy is simple but its analysis is quite involved, especially 
when p > 4. Also, if we are interested in higher order moments (other than variance) 
of the estimator, the analysis becomes very tedious. 

2.2 The Alternative Projection Strategy 

Instead of one projection matrix R, we generate three, R( a \ R^ b \ R( c \ independently. 
By random projections, we generate six vectors in k dimensions, ui, U2, v.3, v\, i>2, 
v 3 e K fc , such that 

D D D 

E(c) 2 (a) 3 (b) 

i—1 i—1 i — 1 

D D D 

E(b) ST^ 2 (a) 3 (c) 

i—1 i—1 i—1 

Here we abuse the notation slightly by using the same u and v for both projection 
strategies. 

Again, we have an unbiased estimator, denoted by d^ a 

D D - 

d (4),a = E X i + E y * + I ( 6u 2«2 - 4 U Jvi - &u[v 3 ) 



i=l i=l 



Lemma 2 



36 ( ° D ( ° 

Var (d (4) , a J =— ( Y A J2vt + ( E x *y* 

i__ 1 i—i \ i—i 

E^E^+IE 



16 

T 



16 

T 



i—1 i—1 \z=l 



Proof 2 The proof basically follows from that of Lemma\l\ 
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Compared with Var (d(4)^ in LemmaQ] the difference would be A4 

Var (d^J - Var (d (4 ),a) = A 4 



4 

D 

■", .7; 



i=l 



' D D D D \ 

E x ? E f «■ + E E ^ I 

i—1 i—1 i—1 / 



•f E^'E^-E^E^; 1 • a) 

\i=i ^=1 i=\ i=i j 

which can be either negative or positive. For example, when all Xi's are negative and 
all yi's are positive, then A4 > 0, i.e., the alternative projections strategy results in 
smaller variance and hence it should be adopted. 

We can show in Lemma [3] that when the data are non-negative (which is more 
likely the reality), the difference in ([]]) will never exceed zero, suggesting that the 
basic strategy would be preferable, which is also operationally simpler (although more 
sophisticated in the analysis). 

Lemma 3 If all entries of x and y are non-negative, then 

Var (d (4) ) - Var (d (4 ),a) = A 4 < 0. (2) 
Proof 3 See Appendix\E[ □. 

Thus, the main advantage of the alternative projection strategy is that it simplifies 
the analysis, especially true when p > 4. Also, analyzing the alternative projection 
strategy may provide an estimate for the basic projection strategy. For example, the 
variance of d^ a is an upper bound of the variance of d( 4 ) in non-negative data. 

In the next subsection, we show that the alternative strategy make the analysis fea- 
sible when we take advantage of the marginal information. 



2.3 Improving the Estimates Using Margins 

Since we assume that a linear scan of the data is feasible and in fact the estimators in 
both strategies already take advantage of the marginal I4 norms, 5^2=1 x t an d S2=i Vi> 
we might as well compute other marginal norms and try to take advantage of them in a 
systematic manner. 

Lemma|4]demonstrates such a method for improving estimates using margins. For 
simplicity, we assume in Lemma|4]that we adopt the alternative projection strategy, in 
order to carry out the (asymptotic) analysis of the variance. 
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Lemma 4 Suppose we use the alternative projection strategy described in Section \Z2\ 
to generate samples Uij, U2,j, u^j, Vij, V2j, andv^j. We estimate du) by 



D D 



d(4),a,mie = ^2 x i + ^2vt + 602,2 - 4a 3 ,i - 4ai, 3 , 

i=l i=l 

where 02,2, 0-3.1, ^1,3, ore respectively, the solutions to the following three cubic equa- 
tions: 

a 2 1 D D 

3 "2,2 T 1 4 4 T 

«2,2 ~k~ U2V2 ~ k 2-^ X ' 2-1 Vi U2V2 

i=l i=l 

+ a 2 ,2 I "E^E 2 ^ I + \ ^xt\\V2\\ 2 + ^yi\\u 2 \\ 2 J = 0. 

\ i = l i=l / \i=l 1 = 1 / 

2 D D 

3 a 3,i r 1 6 V"^ 2 r 

«3,i — ^~ UzVl ~ ~j: X^ Xi X^y* U3Vl 

i=l j=l 

+ 03,1 f-Ex!^^ + fE^Kf+^^NH 8 ) =0. 

V i = l i=l / \i=l 1 = 1 / 



,2 1 £> £> 



3 a l,3 r 1 2 6 T 

a i,3 — — u i«3 t x i z^yi u i v 'i 



i = l i=l 



+ ai,a f - E E 2/?) + (E * 2 imi 2 + E »* n 2 ) = o- 

V i=l i=l / \i=l 1=1 / 

Asymptotically (as k — > 00), the variance would be 

Var (d( 4 ),a,m; e ) 
=36Var (0,2,2) + lQVar (a 2 ^) + IQVar (02,2) 

gg ( Ej=l x i Ej=l Vi ~ (Ei=l X i Vi 



1G 



Ei=l x i Ei=l 2/i + (^Ei=l »•/ 
Ei=l X i Ei=l Vi ~ (Ei=l X iV'' 



2 

V 1 



, 2 

= 1 2/i ! 



1G 



E£=i x l E£=i 2A 2 + (E*=i 



Ei=l Ei=l 2/f + (Ei=l 



=1 Xiyf 



2 .o lfca 
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Proof 4 [?, ?7 proposed taking advantage of the marginal I2 norms to improve the 
estimates of I2 distances and inner products. Because we assume the alternative pro- 
jection strategy, we can analyze 02,2, ^3,1, and 61,3, independently and then combine 
the results; and hence we skip the detailed proof. 

Of course, in practice, we probably still prefer the basic projection strategy, i.e., 
only one projection matrix instead of three. In this case, we still solve three cubic 
equations, but the precise analysis of the variance becomes much more difficult. When 

the data are non-negative, we believe that Var ^d(4) jajm ; e ^ will also be the upper bound 
of the estimation variance using the basic projection strategy, which can be easily ver- 
ified by empirical results (not included in the current report). 

Solving cubic equations is easy, as there are closed-form solutions. We can also 
solve the equations by iterative methods. In fact, it is common practice to do only a 
one-step iteration (starting with the solution without using margins), called "one-step 
Newton-Rhapson" in statistics. 



3 Normal Random Projections for P=6 

For higher p (where p is even), we can follow basically the same procedure as for 
p = 4. To illustrate this, we work out an example for p = 6. We only demonstrate the 
basic projection strategy. 

The Iq distance can be decomposed into 2 marginal norms and 5 inner products at 
various orders: 

D D D 

,,3 



d (6) = E^+E^- 20 E^ 3 

i=l i=l i=l 

+ 15 £ xlyt + 15 £ x\y\ - 6 5>? yi - 6 5> <y ? 



i=l i=l i=l i=l 

Generate one random projection matrix R E M. Dxk , and 

D D D 



3 r 



i=l i=l i=l 

i=l i=l 
_D _D _D 

= E yi r ij> v *j = E y1 r io^,j = E y3r ^ 

i— 1 i— 1 i— 1 

= E yi r ij> v 5,j = E y* ? r « • 



i=i i=i 
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Lemma [5]pro vide the variance of the following unbiased estimator of d(R\ : 

D D 

rf(6) = ^2 Xi + ^ Vi + T (~ 2OM3U3 + 15u\v2 + 15^2^4 — 61I5U3 — 6ltlusj 
i=l i=l 
D D k 

= 2J a:? + 2J yf + — — 20u3,ji>3,j + 15«2,j«4,j + 15«4,jW2,j — (iui.i'os.j — 6u5^wi,j. 

i=l i=l J— 1 



Lemma 5 



Var ( d( 6 ) 



y i=i i=i \i=i / / yi=i i=i \i=i 

yi=i 1=1 \i=i / / yi=i 4=1 \»=i y 

\i=l i=l \i=l / / 



where 



600 / D D D D . \ goo / c ° r D D 

= — r- I E x » E ^ + E E — jr ( E ^ E y*' + E E ^ 3 

\j=l i=i i=i i=i / \i=l i=i i=i i=i 

240 / D D D D \ 240 / D D D D 

+ ~ ( E x » E ^ 8 + E ^ f < E Xi ^ 3 ) + ~ ( E x * E + E x3 ^ E ^j^ 3 

\j=l i=l i=l i=l / \i=l i=l i=l i=l 

/ 13 13 13 D \ , Qn / D D 13 13 

i 40 " I \ " 6 \ ~* 6 . \ "> 2 2 \ ~> 4 4 I / 3 \ ~> 9 . \ ~~ 2 5 \ "> 4 

+ — \Z^ X * X^ yi + l^ Xiyi l^ Xiyi ) — jr \ 2-*> Xi 2-~> Vi t-~> Xi Vi XiVi 

\i=l i=l i=l i=l / \i = l i=l 1=1 i=l 

13 D D D \ / D D D D 

E x * E ^ 5 + E ^ f < E ) ~ ~ ( E ^ E yi + E E Xiy * 

i= 1 i—l j — 1 i— 1 / \ i— 1 i — 1 i—l i — 1 

D D D D \ 72 { ° ° ° ° \ 

E x * E y i + E ^ f < E x ^ + y ( E x%i E ^ 6 + E XiVi E a; ^ 15 

i=l i=l i = l i=l / \i=l i=l i = l i=l / 



180 

k 



Proof 5 See Appendix\C\ □. 

When all entries of x and y are non-negative, we believe it is true that Aq < 0, but 
we did not proceed with the proof. 

Of course, it is again a good idea to take advantage of the marginal norms, but we 
skip the analysis. 



4 Sub-Gaussian Random Projections 

It is well-known that it is not necessary to sample ry ~ iV(0, 1). In fact, to have an 
unbiased estimator, it suffices to sample r%j from any distribution with zero mean (and 
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unit variance). For good higher-order behaviors, it is often a good idea to sample from 
a sub-Gaussian distribution, of which a zero-mean normal distribution is a special case. 

The theory of sub-Gaussian distributions was developed in the 1950's. See [?] 
and references therein. A random variable x is sub-Gaussian if there exists a constant 
g > such that for all iel: 



E (exp(xt)) < exp 



9 t 



In this section, we sample from a sub-Gaussian distribution with the following 
restrictions: 

E(r«) = 0, E(r y ) = l, E(r&) = a, 

and we denote m ~ SubG(s). It can be shown that we must restrict s > 1. 

One example would be the r»j ~ Uniform(—V3, for which s = |. Al- 
though the uniform distribution is simpler than normal, it is now well-known that we 
should sample from the following three-point sub-Gaussian distributions [?]. 

!1 withprob. 
with prob. 1 — ^ , s > 1 
— 1 withprob. ^ 

In our analysis, we do not have to specify the exact distribution of r,j and we can 
simply express the estimation variance as a function of s. 

Here, we consider the basic projections strategy, by generating one random projec- 
tion matrix R e R nxD with i.i.d. entries ~ SubG(s), and 

D D D 



Uij — XiTij , U2.j = , U3j- = x i rij , 

i=l i=l i— 1 

_D _D D 



i—1 i—1 i—1 

We again have a simple unbiased estimator of 
d d ^ 



10 



Lemma 6 



Var 



36 / ° ° ( ° \ 2 D ^ 

(d(4), s ) =-r \ ^2 x t^2yt + I xfyf + (s - 3) ^2 xjyt 

\i=l i=l \i=l / s=l , 



48 



i — l i — l \i — 1 / i — l 
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\i=l i=l \i=l / i=l 

D D D D D 

Y. x >Y.y> + ^ x ^ + ( s - 3 ) I] 

i=l i=l i=l i=l i=l 

D D D D D 

Y x ^Y y ^ + Y Xiy * + ( s ~ 3 ) X] x * y * 



48 

\i— 1 i — l i—l i — l i—l 

32 / ° D D D D 

+ ~jr ( x * y< + X] X] + ( s - 3) x 

\i=l i = l i=l i=l 



4 4 



Proof 6 See Appendix^ □. 

5 Conclusions 

It has been an active research topic on approximating l p distances in massive high- 
dimensional data, for example, a giant "data matrix" A 6 K nxI? . While a linear scan 
on A may be feasible, it can be prohibitive (or even infeasible) to compute and store all 
pairwise l p distances. Using random projections can reduce the cost of computing all 
pairwise distances from 0(n 2 D) to (n 2 k) where k -C D. The data size is reduced from 
0(nD) to 0(nk) and hence it may be possible to store the reduced data in memory. 

While the well-known method of stable random projections is applicable to < 
p < 2, not directly to p > 2, we propose a practical approach for approximating the 
l p distances in massive data for p = 2, 4, 6, based on the simple fact that, when p 
is even, the l p distances can be decomposed into 2 marginal norms and p — 1 "inner 
products" of various orders. Two projection strategies are proposed to approximate 
these "inner products" as well as the l p distances; and we show the basic projection 
strategy (which is simpler) is always preferable over the alternative strategy in terms 
of the accuracy, at least for p = 4 in non-negative data. We also propose utilizing the 
marginal norms (which can be easily computed exactly) to further improve the esti- 
mates. Finally, we analyze the performance using sub-Gaussian random projections. 
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A Proof of Lemma 1 



D D 

d(A) = X * + ^ + ( 6?i 2«2 - 4U3«l - 4«i« 3 ) 

i— 1 i— 1 

D D / fe 



/ D \ / D \ D 

I 2 1 / 2 1 2 2 2, 2 2 

\i=l / \i=l / i=l 



Thus 



2 



3 



Similarly, we can show 

D 

E(it3,jWi,j) = y~^a;fyi, E («i,j«3,j) = 

i=l i=l 

Therefore, 

D D ^ / k \ 

E (rf(4)) =2Z :E i + Xj^i + ( Xj E ( 6tI2 'j" U2 .3 ~ 4u 3,J«l,J - 4liijD 3j -) I 

i=i t=i K \i=i V t=i «=i »=i / / 

To derive the variance, we need to analyze the expectation 

(6w 2 ,j«2,j - 4u 3tj v ltj - 4wi ii « 3iJ ) 2 

2 2 2 2 2 2 

— 48U2,jUijV2,jV3j + 32U3,jUl t jVl,jV3 t j. 

To simplify the expression, we will skip the terms that will be zeros when taking expectations. 

E (m2,3«L) = e ^ ^2 x iyi r ij + x ^ r ijyi ,r i'j^ j 

( D \ 

_P / 4 4 4 . 9 2 2 2 2 2 2. V~~^ 4 2 4 2 ) 

\i=i i^i' / 



1 4 4 . , , 2 2 2 2. 4 4 

D D / D \ 2 

= Yl x i 2/4 + 2 ( ■ 

i=l i=l \i=l / 
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Similarly 



E ( u h v L 



E ( u h v h) = 



D D / D \ 2 

i=l i=l \i=l / 

D _D / D \ 2 

x < ^ 6 + 2 ( Yl Xi Vi ■ 

i=l i=l \i=l / 



E(«2,jU3,jV2,jWl,j) 



/ D D D D \ 

\i=l i=l i=l i=l / 



D 



\ \ »=1 i^i' / I 1=1 



/ D 

_ p I 5 3 4. 5 2 3 2 

\i=i i&< , 



in I 2 2 2 3 2 i \ ^ 2 2 3 2 2 1 

D 

o V* 5 3 . 5 3 . \ "> 2 2 3 . \ " 2 3 2 

=3 2_^x i y i +2_,x i y l , + ^, x iVi x ^V^ + 2-*,XiV%x%'Vv 

i=l i^i' i^i' i^i' 

D D D D D D 

= ^ ^ + x ^ ^ + ^y* x3 ^ 2 - 

i=l i = l i = l i = l i = l i=l 

Similarly 

B(U2,jUl,jV 2: jV3,j) 

D D D D D D 

i— 1 i — 1 i— 1 i — 1 i— 1 i — 1 

E(U3,jUl,j«ljV3,j) 
D D D D D D 

= x * ^ 4 + Xi ^ 3 x3 ^ + Xiyi a; ^ 3 - 

i=l 1=1 i=l i=l i=l i=l 
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Therefore, 

Var (6u2,j«2,j — 4:U3jVij — 4ui,jVs,j) 



=36 x t + 72 ( Yl ) 

i=l i=l \i=l / 

D D / D \ 2 

+i6 x i + 32 ( X ^ ) 

i=l i=l \i=l / 

D D / D \ 2 

+ 16 + 32 PT^y? 

i=l i=l \i=l / 

/ D D D D D D 

-48 (J2^J2 y> + 1] x <^ 2 1] x i w* + 1] x » 2 w* I] x * 

\i— 1 i—1 i — 1 i—1 i—1 i — 1 

/ D D D D D D 

-48 I ^ x • i/f + x ^ x2j/ ' 2 + Xiy2 

\i=l i=l i=l i=l i=l i=l 

/ D D D D D D 

+32 1 ^2 x\ ^2 yt + ^2 Xiyf ^2 x \y* + XiVi x ^y* 

\i=l i=l i=l i=l i=l i=l 

/ D D D \ 2 

- 6 ^2 x iyf - 4 x i y* - 4 






from which it follows that 



Var 





2:2 + Xiy i 



4 4 . / 2 2 

2_^ x i z^y* + z^ x iVi 



D D / D 



D D / D 



D D / D 



) 



) 
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B Proof of Lemma 3 



It suffices to show that 

D D D D \ 

X x * X + X x ^ yi X x *y* I 

i — l i— 1 i— 1 i— 1 / 

D D D D 

X ^ X ^ + X X 

i— 1 i— 1 i — 1 i — 1 

D D D D 

X ^ X ^ + X X iVi X x i y i ) - 

\i— 1 i— 1 i — 1 i — 1 

We need to use the arithmetic-geometric mean inequality: 

1 / n 



n / n \ V» 

y^uij > 7i , provided u>j > 0. 

i = l \i=l / 



Because 



CD D 15 D D 

X ^ X ^ 3 + X) x * X) ^ 5 ~ X^ x * X^ ^ 4 - °- 

i = l i=l i=l i = l i=l i=l 

Thus it only remains to show that 

Z> D D D D D 

X X x *v* + X * X x ^ y i ~ X xiyi X ^ f < - °> 

i—l i — l i—l i — l i=l i — l 

for which it suffices to show that 

D D D D 

nS~^ 3/2 3/2 5/2 5/2 3 3^ n 

2 Z^ x i y i Z^ x i y i -z^ Xiyi Z^ x i y i - ' 

i — l i—l i — l i — l 

or equivalently, to show that, if Zi > Vi £ [1, -D], then 

D D D D 

f( Zi ,i = 1, 2, £>) = 2 J)*? 4 ~ X ^ X ^ °- ^ 

i — 1 i — 1 i—1 i—1 

Obviously, (HJl holds for D = 1 and D = 2. To see that it is true for D > 2, we notice that 
only at («i = 0, 22 = 0, zd = 0), the first derivative of f(zi) is zero. We can also check 
that f(zi = 1, i = 1, 2, I?) > 0. Since /(z;) is a continuous function, we know f(zi) > 
must hold if > for all i. There is no need to worry about the boundary case that Zj = and 
Zi > because it is reduced to a small problem with D' — D — 1 and we have already shown 
the base case when D = 1 and D — 2. Thus, we complete the proof. 
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C Proof of Lemma 5 



D 

3 



i—l i — l i—l 

D D D D 

i=l i=l i=l i=l 



£> 13 

A ^ — G ^ — 6 I T X X X X \ 

d(6) = 7 y Xj + 2^3/i + ^ (^-20m 3 u 3 + 15u 4 w 2 + 15m 2 «4 - 6u 5 « 3 — 6uiU 5 J 

D D fc 

= ^2 X< i + ^ + r _ 2 ° M 3,3«3,3 + 15u 2 ,jV4,j + 15«4,jV2,j - 6ltl,j«5,j — 6U5,j«l, ; 

i—l i—l j — 1 

To derive the variance, we need to analyze the expectation of 

(— 2Cht3ji>3j + 15u2 t jV4 t j + 15u 4j j«2,j — 6uijv 5 j — 6u 5t jVij) 2 
= AOOugjVgj + 225«2,j^4,j + 225uijV2j + 36uijv$j + 36uljVij 

— 600U3,j«3,j«2,j«4,i — 600U3jV3jU4,jV2,j + 240usjVsjUijV 5 j 
+ 240U3jV3jU5,jVlJ + 4:50ll2,jV4,jU4,jV2,j — 18QU2,jV4,jUljV5,j 

— 180U2,jV4,jUsjVij — 18QU4,jV2 } jUl } jVsj — l80U4 t jV2,jUs } jVlJ 
+ 72U1JV5JU5JV1J 

Skipping the detail, we can show that 

D D / D \ 

E i u h v h) = Yl x&i Z y * + 2 ( Z x i y i ) 

i = l i=l \i=l / 

D D / D \ 2 

E (uljvjj ) = Y x i Y f' ? + 2 ( Z ^ f < ) ' 

i=l i=l \t=l / 

D D / D \ 2 

E (uhvlj ) = ^ xf ^ + 2 I ^ a* ?/j J , 

i=l i=l \i=l / 

D D / D \ 2 

E i u h v h) = ^ Z + 2 ( Z ^ ) 

i=l i=l \i=l / 

D D / D \ 2 

E («5,j«i,i) = ^ f» ? + 2 ( Z ^ ) 
i=l i=l \i=l / 
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E{u 3 ,jU2,jV 3 ,jV4,j) 

D D D D D D 

\ "> 5 7 . 3 3 2 4 . 3 4 2 3 

i=l i=l i=l i=l i=l i=l 
E(u 3 ,jU4,jV 3 ,jV2,j) 

D D D D D D 

7 \ ^ 5 . 3 3 V 1 4 2 . 3 2V 4 3 
= 2^ +Z^ XiVi l^XiVi + Z^ XiVi l^ X iVi 

i=l i — 1 i — 1 i — 1 i=l i—1 

E(U3,jWl,jW3,jW5,j) 

D D D D D D 

- Y ^ ^ + Y x < f< + x3 f * Y Xi ^ 3 

1=1 1=1 i=l i=l 1=1 i=l 

E(U3,jU5,j«3jfl,j) 

D D D D D D 

-Y x ^Y y ^ + Y x *yi Y + Y 

i=i »=i »=i »=i i=i »=i 

E (ll2,jU4,jV4,jV2,j) 

D D D D D D 

i=l »=1 i=l i=l i=l i=l 

E(u2,jUl,j«4jVB,j) 

D D D D D D 

: Y x *Y y '? + Y x ^ + ^ Xiy i 

i—1 i — 1 i — 1 i — 1 i—1 i — 1 
E(u 2 ,jUB,jV4,jVl,j) 

D D D D D D 

- Y X * Y y i + Y X i y i Y X i yi + Y X i yi Y X i y i 
i=l »=1 i=l i=l »=1 i=l 

E (U4,jUl,jV2jf5,j) 

D D D D D D 

: Y x *Y y i + Y X i y i Y Xiy i + Y X i y i Y Xiy * 

i—1 i—1 i—1 i—1 i=l i—1 

E (U4,jU5jV2,jVlj) 

D D D D D D 

\ "> 9 V 1 3 . 4 2 V 1 5 . \ " 4 \ ~> 5 2 
: Z^ X * Z^ Vi + / . X i y i / , X i yi + / , x iVi / , X iVi 

i—1 i — 1 i — 1 i — 1 i — 1 i—1 

E(ui,jUB,jV 5 ,jVl,j) 

D D D D D D 

Y x * Y y i + Y Xiy i Y x i yi + Y Xiyi Y a; ' 5 ^ 5 ' 

i—1 i — 1 i — 1 i — 1 i — 1 i — 1 
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Combining the results, we obtain 



Var 




where 



kA 6 /6 = - 100 E x\ E vl + E x ^ E x 'v' 

\j — 1 i— 1 i— 1 i— 1 

/ D D D D 

- ioo I e a 'i E y * + E x * ^ E ^ ^ 

\j — 1 i— 1 i— 1 i— 1 

/ D D D D \ 

+ 40 I E ^ E ^ 8 + E ^ ^ E ) 

\i=l i=l i=l i=l / 

+ 4 o (e ^ E + E E ^0 

\i=i i=i i=i i=i / 

+ 75 (f>?f>? + f>h/?f>?!/A 

\i=l i=l i=l i=l / 

/ D D D D \ 

- 30 ( E x * E + E x » 2 f * E Xi ^ 4 ) 

\i=l i=l i=l i=l / 

/ D D D D \ 

- 30 ( E ^ E y* + E ^ f * E a; ' 5 ^ 4 

\i=l i=l i=l i=l / 

/ D D D D \ 

- 30 ( E x * E y l + E ^ f < E Xi2/ » 2 ) 

\i=l i=l i=l i=l / 

-30 (X>?X>? + X>^X>?2/) 

\i=l i=l i=l i=l / 

/ D D D D \ 

+ 12 E^E^ + E^E 3 ^ 5 
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D Proof of Lemma 6 



d(4),s = 


D D 

= x\ + yf + -■ (6u\v2 
i=i i=i 


- 4U3«i - 


D 


D x / fc 




Y^ 

i=l 




- 4u 3 ,jUi j 



E («2 j«2,j) = E I I X iyi r ij + 5Z X i r ijyi' r i'. 



_ p I 4 4 4 , 9 2 2 2 2 2 2. 424 

— ^ I / f x i Vi r ij T ^ / j ^iVi fij^i'Vi'Tit j T" / f %i Tjj Vi 

\i=l i&> i^i' 

D 

\ ^ 4 4 . , , 2 2 2 2. 4 4 

»=1 i^i' i^i' 

D D / D \ 2 D 

= X < ^ + 2 ( + ( S " 3 ) X iVi- 

i=l i=l \i=l / i=l 



Similarly, 



£> D { D \ D 

e (ui^t;?,,-) = X* Y + 2 [Y x * + - 3) 35 

i=l i=l \i=l / i=l 

D D / D \ 2 _D 

E ( u s = $3 ^ + 2 5Z Xiy » 3 + ( s _ 3 ) Y x ' 2j/ 



E (U2,jM3,j«2jW1,j) = 5Z 5Z ^ + Y 5Z ^ 

i=l i = l i=l i = l 

D D D 

+ Y Y x ^ + ( s ~ 3 ) ^ a;fi/, 3 , 
i=i i=i i=i 

D D D D 

E (u 2 ,jUijV2,jV 3 j) = x 3 i Y Vi + Y ^ 5Z 

i=l i=l i=l i=l 

D D D 

+ Y Xi ^ 2 Y x ' 2 ^ + ( s _ 3) 5Z x ^> 

i = l i = l i=l 

D D D D 

e (u 3 ,jui,jwi jf3,j) = Y x * Y + Y x ^ Y 

i=l i = l i=l i = l 

£> D D 

+ Y Xi y t Y x ^ + ( s - 3) Y x *y*- 
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Therefore, 



Var (6«2,j«2,3 —4u3jVij — 4uijv 3 j) 
=36 XX f>J + 72 f E ?) + 36 ( s " 3) E 



+ 16 E K i E + 32 E ^ + 360 - 3) 52 xlvl 

1=1 t=l \i=l / i=l 

+ !6 E x i E »? + 32 (E + 360 -3)J2 x iVi 

i=i «=i Vi=i / i=i 

/ D D D D D D D 

~ 48 ( E x « E y i + E x i f ? E + E x i Vi E x « f i + ( s - 3)52 x * y i 

\i = l 1=1 i=l i=l i = l i=l i=l 

- 48 fE **< E + E »i»f E . 2 + E **« 2 E * 2 ^ + 0-3) E *?»? 



/ D D D D D D D 

+32 ( X X i X + ^ X X i y * + X X y i + ( s - 3 ) X ^ 

\i=l i=l i=l i=l i = l i=l i=l 

/ D D D \ 2 

-(e£*?w?-4£*h« ) 

V 1=1 i=i »=i / 
from which it follows that 

36 ( ° D I ' ° \ 2 ° \ 

Var [d (4):S J =— ^ a; 4 ^ t/f + x\yl + (s - 3) ^ xf j/ 4 

\i=l i=l \i=l / i=l / 



+t(E^E^ 2 +(E*M +(--3)E^ 



„6. 2 



k 



E x » E 2/3 + E ^ E ^ + ( s _ 3 ) E a; ' 5 ^ 3 

Vi— 1 i—l i — l i—l i — l 

/ D D D D D 

E x3 E ^ 5 + E x<2/2 E x2y3 + ( s - 3 ) £ x ^ 



i— 1 i — l \ i— 1 / i— 1 

D D / D \ 2 D 



2 6 



48 

" ft 



48 
" k 



i—l i—l i — l i—l i—l 



32 / ° ° ° ° ° 

+ t I E x i E y i + E ^ E + ( s - 3 ) E x4y4 
\i=i i=i i=i i=i i=i 
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